{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "23cf319b",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/WeaviateIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "307804a3-c02b-4a57-ac0d-172c30ddc851",
   "metadata": {},
   "source": [
    "# Weaviate Vector Store"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5508d8ac",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bac8f172",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-vector-stores-weaviate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b05d5956",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "#### Creating a Weaviate Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08ad68ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eccceb71",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72a4b618-668d-4713-84c5-6362030e9f19",
   "metadata": {},
   "outputs": [],
   "source": [
    "import weaviate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad860554",
   "metadata": {},
   "outputs": [],
   "source": [
    "# cloud\n",
    "cluster_url = \"\"\n",
    "api_key = \"\"\n",
    "\n",
    "client = weaviate.connect_to_wcs(\n",
    "    cluster_url=cluster_url,\n",
    "    auth_credentials=weaviate.auth.AuthApiKey(api_key),\n",
    ")\n",
    "\n",
    "# local\n",
    "# client = connect_to_local()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "#### Load documents, build the VectorStoreIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.vector_stores.weaviate import WeaviateVectorStore\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0cf37644",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b7da523",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba1558b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import StorageContext\n",
    "\n",
    "# If you want to load the index later, be sure to give it a name!\n",
    "vector_store = WeaviateVectorStore(\n",
    "    weaviate_client=client, index_name=\"LlamaIndex\"\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")\n",
    "\n",
    "# NOTE: you may also choose to define a index_name manually.\n",
    "# index_name = \"test_prefix\"\n",
    "# vector_store = WeaviateVectorStore(weaviate_client=client, index_name=index_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9add1bd1",
   "metadata": {},
   "source": [
    "#### Using a custom batch configuration\n",
    "\n",
    "Llamaindex defaults to Weaviate's dynamic batching, optimized for most common scenarios. However, in low-latency setups, this can overload the server or max out any GRPC Message limits in place. For more control and a better ingestion process, consider adjusting batch size by using the fixed size batch.\n",
    "\n",
    "\n",
    "Here is how you can fine tune WeaviateVectorStore and define a custom batch:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "936caf33",
   "metadata": {},
   "outputs": [],
   "source": [
    "from weaviate.classes.config import ConsistencyLevel\n",
    "\n",
    "custom_batch = client.batch.fixed_size(\n",
    "    batch_size=123,\n",
    "    concurrent_requests=3,\n",
    "    consistency_level=ConsistencyLevel.ALL,\n",
    ")\n",
    "vector_store_fixed = WeaviateVectorStore(\n",
    "    weaviate_client=client,\n",
    "    index_name=\"LlamaIndex\",\n",
    "    # we pass our custom batch as a client_kwargs\n",
    "    client_kwargs={\"custom_batch\": custom_batch},\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "04304299-fc3e-40a0-8600-f50c3292767e",
   "metadata": {},
   "source": [
    "#### Query Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35369eda",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set Logging to DEBUG for more detailed outputs\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bedbb693-725f-478f-be26-fa7180ea38b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0123088b",
   "metadata": {},
   "source": [
    "## Loading the index\n",
    "\n",
    "Here, we use the same index name as when we created the initial index. This stops it from being auto-generated and allows us to easily connect back to it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3e2c654",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster_url = \"\"\n",
    "api_key = \"\"\n",
    "\n",
    "client = weaviate.connect_to_wcs(\n",
    "    cluster_url=cluster_url,\n",
    "    auth_credentials=weaviate.auth.AuthApiKey(api_key),\n",
    ")\n",
    "\n",
    "# local\n",
    "# client = weaviate.connect_to_local()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2e0997c",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = WeaviateVectorStore(\n",
    "    weaviate_client=client, index_name=\"LlamaIndex\"\n",
    ")\n",
    "\n",
    "loaded_index = VectorStoreIndex.from_vector_store(vector_store)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc9a2ad0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set Logging to DEBUG for more detailed outputs\n",
    "query_engine = loaded_index.as_query_engine()\n",
    "response = query_engine.query(\"What happened at interleaf?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3a41a70d",
   "metadata": {},
   "source": [
    "## Metadata Filtering\n",
    "\n",
    "Let's insert a dummy document, and try to filter so that only that document is returned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df6b6d46",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "\n",
    "doc = Document.example()\n",
    "print(doc.metadata)\n",
    "print(\"-----\")\n",
    "print(doc.text[:100])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30b0b2e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "loaded_index.insert(doc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1bd18f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters\n",
    "\n",
    "filters = MetadataFilters(\n",
    "    filters=[ExactMatchFilter(key=\"filename\", value=\"README.md\")]\n",
    ")\n",
    "query_engine = loaded_index.as_query_engine(filters=filters)\n",
    "response = query_engine.query(\"What is the name of the file?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29a92918",
   "metadata": {},
   "source": [
    "# Deleting the index completely\n",
    "\n",
    "You can delete the index created by the vector store using the `delete_index` function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0a5b319",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.delete_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71932f10-3783-4f8d-a112-b90538d66971",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.delete_index()  # calling the function again does nothing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6eadc2c1",
   "metadata": {},
   "source": [
    "# Connection Termination"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7d8149a",
   "metadata": {},
   "source": [
    "You must ensure your client connections are closed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ac6e54e",
   "metadata": {},
   "outputs": [],
   "source": [
    "client.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
